
Are All Linear Regions Created Equal?

This repository contains the code to reproduce the experiments from the paper "Are All Linear Regions Created Equal?".

Set up

To install all required dependencies, run:

pip install -r requirements.txt

Configure environment

To configure the computing environment for running the experiments, set the following environment variables (an export example is shown after the argument list below):

E_SAVE_DIR="/path/to/savedir" # base directory where results are stored
E_NAME="exp_name" # name of the experiment run
E_DATA_DIR="/path/to/dataset/dir" # directory to load datasets from
E_DEVICE="cuda" # "cpu" or "cuda"
E_WORKERS=1 # number of CPU worker processes

or pass the equivalent command-line arguments to train.py, compute_stats.py, and aggregate_stats.py:

 --e_save-dir="/path/to/savedir"
 --e_name="exp_name"
 --e_data-dir="/path/to/dataset/dir"
 --e_device="cuda"
 --e_workers="1"
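For example, in a bash session the environment variables could be exported once before launching any of the scripts. The paths below are placeholders, and the scripts are expected to pick these values up when the corresponding --e_* flags are omitted:

export E_SAVE_DIR="/path/to/savedir"
export E_NAME="exp_name"
export E_DATA_DIR="/path/to/dataset/dir"
export E_DEVICE="cuda"
export E_WORKERS=4

# with the variables above set, the --e_* flags can be dropped, e.g.:
python train.py --data="cifar10" --model="vgg8" --epochs=300 --batch-size=128 --augmentation --seed=42 --train-split=49000 --val-split=1000 --eval-every=10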

Train a model

To train a model, run

python train.py --e_device=cuda --e_name='test' --e_save-dir='checkpoints' --e_data-dir="./data" --e_workers=4 --data="cifar10" --model="vgg8" --epochs=300 --batch-size=128 --augmentation --seed=42 --train-split=49000 --val-split=1000 --eval-every=10

Model checkpoints for the corresponding run are compressed into a single zip archive and stored at ./checkpoints/test/vgg8/cifar10/augmentation/seed-42/checkpoints.zip
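As a quick sanity check, the archive produced by the command above can be listed with standard tools (the exact member names inside the archive are not documented here):

unzip -l ./checkpoints/test/vgg8/cifar10/augmentation/seed-42/checkpoints.zip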

Compute statistics

To estimate linear region density and absolute deviation along data-driven paths, load checkpoints of trained models by specifying a network architecture, dataset, training seed, checkpoint, and dataset split:

python compute_stats.py --e_device="cuda" --e_name=E_NAME --e_save-dir=E_SAVE_DIR --e_workers=1 --e_data-dir=E_DATA_DIR --data=cifar10 --model=vgg8 --augmentation --seed 42 --l_load-checkpoints 1 --train-split=49000 --val-split=1000 --l_gen-strategy="closed-path-train" --l_num-paths=1024 --l_buff-size=30000 --l_closed-path-radius=4 --batch-size=128 --l_num-anchors=8

Results are stored in uncompressed json format at E_SAVE_DIR/E_NAME/MODEL/DATA/TRAINING_SETTING/seed-SEED/OUT_NAME-BATCH_ID.json, where BATCH_ID identifies the batch for which the statistics were computed. The size of the json files depends on the number of linear regions discovered and can be on the order of several gigabytes for large networks or when many paths are generated, so it is advisable to compress the json files when running several experiments.
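For instance, the per-batch results of a finished run could be gzip-compressed in place. The seed-42 path below simply mirrors the layout described above and should be adjusted to the run at hand; if aggregate_stats.py expects plain json (as the output format suggests), decompress with gzip -d before aggregating:

find "$E_SAVE_DIR/$E_NAME/vgg8/cifar10/augmentation/seed-42" -name '*.json' -exec gzip {} +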

Aggregate statistics

Finally, if a single experimental run produces multiple json files, aggregate the statistics into a single json file that can be used for plotting by running:

python aggregate_stats.py --model MODEL --data DATA --npaths NPATHS --load-from LOAD_FROM.txt --output OUTPUT --checkpoint-id CHECKPOINT_ID --dataset-split DATASET_SPLIT

where LOAD_FROM.txt is a plaintext file listing the json results generated by compute_stats.py, one path per line.
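One way to build such a file, assuming the output layout described above for a vgg8/cifar10 run trained with augmentation and seed 42 (adjust the directory and glob to your run), is:

find "$E_SAVE_DIR/$E_NAME/vgg8/cifar10/augmentation/seed-42" -name '*.json' | sort > load_from.txt

The resulting load_from.txt can then be passed to aggregate_stats.py via --load-from.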
